lab test
ExOSITO: Explainable Off-Policy Learning with Side Information for Intensive Care Unit Blood Test Orders
Ji, Zongliang, Amaral, Andre Carlos Kajdacsy-Balla, Goldenberg, Anna, Krishnan, Rahul G.
Ordering a minimal subset of lab tests for patients in the intensive care unit (ICU) can be challenging. Care teams must balance ensuring the availability of the right information against reducing the clinical burden and costs associated with each lab test order. Most in-patient settings experience frequent over-ordering of lab tests, but are now aiming to reduce this burden on both hospital resources and the environment. This paper develops a novel method that combines off-policy learning with privileged information to identify the optimal set of ICU lab tests to order. Our approach, EXplainable Off-policy learning with Side Information for ICU blood Test Orders (ExOSITO), creates an interpretable assistive tool for clinicians to order lab tests by considering both the observed and predicted future status of each patient. We pose this problem as a causal bandit trained using offline data and a reward function derived from clinically-approved rules; we introduce a novel learning framework that integrates clinical knowledge with observational data to bridge the gap between the optimal and logging policies. The learned policy function provides interpretable clinical information and reduces costs without omitting any vital lab orders, outperforming both a physician's policy and prior approaches to this practical problem.
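As a rough illustration of the off-policy setting this abstract describes, the sketch below estimates the value of a candidate lab-ordering policy from logged decisions using inverse propensity scoring (IPS), a standard off-policy estimator. It is not the ExOSITO method. The logged records, propensities, rewards, and the `order_only_if_unstable` policy are all hypothetical and chosen only to make the arithmetic easy to follow.

```python
from typing import Callable, List, Tuple

# One logged decision: (context, action, logging-policy probability of that
# action, observed reward). All values are made up for illustration.
LoggedStep = Tuple[str, int, float, float]


def ips_value(
    logs: List[LoggedStep],
    target_policy_prob: Callable[[str, int], float],
) -> float:
    """Inverse propensity scoring estimate of the target policy's mean reward."""
    total = 0.0
    for context, action, logging_prob, reward in logs:
        # Reweight each logged reward by how much more (or less) likely the
        # target policy is to take the logged action than the logging policy.
        weight = target_policy_prob(context, action) / logging_prob
        total += weight * reward
    return total / len(logs)


def order_only_if_unstable(context: str, action: int) -> float:
    """Hypothetical deterministic policy: order the test (1) only if unstable."""
    chosen = 1 if context == "unstable" else 0
    return 1.0 if action == chosen else 0.0


# Hypothetical log: action 1 = order the test, action 0 = skip it.
logs: List[LoggedStep] = [
    ("stable", 1, 0.5, 0.0),
    ("stable", 0, 0.5, 1.0),
    ("unstable", 1, 0.8, 1.0),
    ("unstable", 0, 0.2, 0.0),
]

estimate = ips_value(logs, order_only_if_unstable)
print(f"IPS estimate of the candidate policy's value: {estimate:.4f}")
```

IPS is unbiased only when the logging policy gives nonzero probability to every action the target policy might take, and its variance grows as the two policies diverge. That is one way to read the "gap between the optimal and logging policies" mentioned in the abstract.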
LabTOP: A Unified Model for Lab Test Outcome Prediction on Electronic Health Records
Im, Sujeong, Oh, Jungwoo, Choi, Edward
KAIST, Republic of Korea. These authors contributed equally.
Abstract: Lab tests are fundamental for diagnosing diseases and monitoring patient conditions. However, frequent testing can be burdensome for patients, and test results may not always be immediately available. To address these challenges, we propose Lab Test Outcome Predictor (LabTOP), a unified model that predicts lab test outcomes by leveraging a language modeling approach on EHR data. Unlike conventional methods that estimate only a subset of lab tests or classify discrete value ranges, LabTOP performs continuous numerical predictions for a diverse range of lab items. We evaluate LabTOP on three publicly available EHR datasets and demonstrate that it outperforms existing methods, including traditional machine learning models and state-of-the-art large language models. We also conduct extensive ablation studies to confirm the effectiveness of our design choices. We believe that LabTOP will serve as an accurate and generalizable framework for lab test outcome prediction, with potential applications in clinical decision support and early detection of critical conditions.
Data and Code Availability: This paper uses the three EHR datasets, MIMIC-IV (Johnson et al., 2023), eICU (Pollard et al., 2018), and HiRID (Hyland et al., 2020), which are publicly available on the PhysioNet repository (Johnson et al., 2020; Pollard et al., 2019; Faltys et al., 2021). More details about datasets can be found at Section 4.1. Our implementation code can be accessed at this repository: https://anonymous.4open.science/r/LabTOP-DE7B
Institutional Review Board (IRB): This research does not require IRB approval.
1. Introduction: Electronic Health Records (EHR) are essential to modern healthcare systems, serving as comprehensive databases of patient data, including treatments, clinical interventions, and lab test results (Gunter and Terry, 2005).
These records provide a longitudinal view of a patient's medical history, allowing for the tracking of individual health trends (Kruse et al., 2017).
DualMAR: Medical-Augmented Representation from Dual-Expertise Perspectives
Hu, Pengfei, Lu, Chang, Wang, Fei, Ning, Yue
Electronic Health Records (EHR) have revolutionized healthcare data management and prediction in the field of AI and machine learning. Accurate predictions of diagnoses and medications significantly mitigate health risks and provide guidance for preventive care. However, EHR-driven models often have a limited understanding of medical-domain knowledge and mostly rely on simple, single ontologies. In addition, due to the missing features and incomplete disease coverage of EHR, most studies focus only on basic analysis of conditions and medications. We propose DualMAR, a framework that enhances EHR prediction tasks through both individual observation data and public knowledge bases. First, we construct a bi-hierarchical Diagnosis Knowledge Graph (KG) using verified public clinical ontologies and augment this KG via Large Language Models (LLMs). Second, we design a new proxy-task learning scheme on lab results in EHR for pretraining, which further enhances the KG representation and patient embeddings. By retrieving radial and angular coordinates in polar space, DualMAR enables accurate predictions based on rich hierarchical and semantic embeddings from the KG. Experiments also demonstrate that DualMAR outperforms state-of-the-art models, validating its effectiveness in EHR prediction and KG integration in medical domains.
Lab-AI -- Retrieval-Augmented Language Model for Personalized Lab Test Interpretation in Clinical Medicine
Wang, Xiaoyu, Ouyang, Haoyong, Bhasuran, Balu, Luo, Xiao, Hanna, Karim, Lustria, Mia Liza A., He, Zhe
Accurate interpretation of lab results is crucial in clinical medicine, yet most patient portals use universal normal ranges, ignoring factors like age and gender. This study introduces Lab-AI, an interactive system that offers personalized normal ranges using Retrieval-Augmented Generation (RAG) from credible health sources. Lab-AI has two modules: factor retrieval and normal range retrieval. We tested these on 68 lab tests--30 with conditional factors and 38 without. For tests with factors, normal ranges depend on patient-specific information. Our results show GPT-4-turbo with RAG achieved a 0.95 F1 score for factor retrieval and 0.993 accuracy for normal range retrieval. GPT-4-turbo with RAG outperformed the best non-RAG system by 29.1% in factor retrieval and showed 60.9% and 52.9% improvements in question-level and lab-level performance, respectively, for normal range retrieval. These findings highlight Lab-AI's potential to enhance patient understanding of lab results.
Introduction: The Health Information Technology for Economic and Clinical Health (HITECH) Act of 2009 played a key role in promoting the adoption and meaningful use of electronic health records (EHRs) throughout the U.S. healthcare system. Through the Medicare and Medicaid EHR Incentive Programs, the Act provided financial incentives that facilitated widespread EHR adoption.
Measurement Scheduling for ICU Patients with Offline Reinforcement Learning
Ji, Zongliang, Goldenberg, Anna, Krishnan, Rahul G.
Scheduling laboratory tests for ICU patients presents a significant challenge. Studies show that 20-40% of lab tests ordered in the ICU are redundant and could be eliminated without compromising patient safety. Prior work has leveraged offline reinforcement learning (Offline-RL) to find optimal policies for ordering lab tests based on patient information. However, new ICU patient datasets have since been released, and various advancements have been made in Offline-RL methods. In this study, we first introduce a preprocessing pipeline for the newly-released MIMIC-IV dataset geared toward time-series tasks. We then explore the efficacy of state-of-the-art Offline-RL methods in identifying better policies for ICU patient lab test scheduling. Besides assessing methodological performance, we also discuss the overall suitability and practicality of using Offline-RL frameworks for scheduling laboratory tests in ICU settings.
Labrador: Exploring the Limits of Masked Language Modeling for Laboratory Data
Bellamy, David R., Kumar, Bhawesh, Wang, Cindy, Beam, Andrew
Both models demonstrate mastery of the pre-training task, but neither consistently outperforms XGBoost on downstream supervised tasks. We encourage future work to focus on joint modeling of multiple EHR data categories and to include tree-based baselines in their evaluations.
In recent years, self-supervised pre-training of masked language models (MLMs) (see Appendix A for background) has demonstrated remarkable success across a wide range of machine learning problems and has led to significant downstream improvements across diverse tasks in natural language processing (Liu et al., 2019; Devlin et al., 2019; Raffel et al., 2020). There is considerable excitement surrounding the potential of large pre-trained MLMs to achieve similar success in medical applications. For instance, existing applications of MLMs in medicine have already yielded promising results in tasks related to medical text understanding (Lee et al., 2020; Alsentzer et al., 2019; Huang et al., 2019; Yang et al., 2019; Beltagy et al., 2019). Laboratory data are abundant, routinely collected, less biased than other types of data in electronic health records (EHRs) such as billing codes (Beam et al., 2021), and directly measure a patient's physiological state, offering a valuable opportunity for creating a medical foundation model. However, there is a large body of evidence showing that deep learning is consistently outperformed on so-called "tabular" data prediction tasks by traditional machine learning techniques like random forests, XGBoost, and even simple regression models (Bellamy et al., 2020; Finlayson et al., 2023; Sharma, 2013). The reasons for this are only partially understood, but previous work (Grinsztajn et al., 2022) has suggested that this phenomenon may be caused by a rotational invariance in deep learning models that is harmful for tabular data.
More broadly, the success of deep learning is thought to be largely due to inductive biases that can be leveraged for images, text, and graphs. These inductive biases are absent or only weakly present in tabular data. Conversely, tree-based methods are scale invariant and robust to uninformative features. We evaluated both models on several downstream outcome prediction tasks and validated the success of pre-training with a set of intrinsic evaluations.
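To make the masked language modeling pre-training task mentioned in this entry more concrete, here is a minimal sketch of generic BERT-style masking applied to a sequence of lab test tokens. It is an illustration only, not Labrador's actual pipeline, and it covers categorical tokens only, with no handling of numeric lab values. The token names, the `mask_rate` value, and the `mask_tokens` helper are all assumptions.

```python
import random
from typing import List, Optional, Tuple

MASK = "[MASK]"


def mask_tokens(
    tokens: List[str], mask_rate: float, rng: random.Random
) -> Tuple[List[str], List[Optional[str]]]:
    """Randomly hide tokens; return model inputs and per-position targets."""
    inputs: List[str] = []
    targets: List[Optional[str]] = []
    for token in tokens:
        if rng.random() < mask_rate:
            inputs.append(MASK)
            targets.append(token)  # the model must recover the hidden token
        else:
            inputs.append(token)
            targets.append(None)  # unmasked positions are ignored by the loss
    return inputs, targets


# Hypothetical sequence of lab test names for one patient.
lab_sequence = ["sodium", "potassium", "creatinine", "glucose", "hemoglobin"]
rng = random.Random(0)  # fixed seed so the example is reproducible
masked_inputs, masked_targets = mask_tokens(lab_sequence, 0.4, rng)
print(masked_inputs)
print(masked_targets)
```

During pre-training, a model would be scored only on the positions where the target is not `None`. Doing well on this task is what "mastery of the pre-training task" would mean in a setup like this one.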
MediTab: Scaling Medical Tabular Data Predictors via Data Consolidation, Enrichment, and Refinement
Wang, Zifeng, Gao, Chufan, Xiao, Cao, Sun, Jimeng
Tabular data prediction has been employed in medical applications such as patient health risk prediction. However, existing methods usually revolve around the algorithm design while overlooking the significance of data engineering. As such, previous predictors are often trained on manually curated small datasets that struggle to generalize across different tabular datasets during inference. This paper proposes to scale medical tabular data predictors (MediTab) to various tabular inputs with varying features. The method uses a data engine that leverages large language models (LLMs) to consolidate tabular samples to overcome the barrier across tables with distinct schema. It also aligns out-domain data with the target task using a "learn, annotate, and refinement" pipeline. The expanded training data then enables the pre-trained MediTab to infer for arbitrary tabular input in the domain without fine-tuning, resulting in significant improvements over supervised baselines: it reaches an average ranking of 1.57 and 1.00 on 7 patient outcome prediction datasets and 3 trial outcome prediction datasets, respectively. In addition, MediTab exhibits impressive zero-shot performances: it outperforms supervised XGBoost models by 8.9% and 17.2% on average in two prediction tasks, respectively.
Tabular data are structured as tables or spreadsheets in a relational database. Each row in the table represents a data sample, while columns represent various feature variables of different types, including categorical, numerical, binary, and textual features. Most previous papers focused on the model design of tabular predictors, mainly by (1) augmenting feature interactions via neural networks (Arik & Pfister, 2021), (2) improving tabular data representation learning by self-supervised pre-training (Yin et al., 2020; Yoon et al., 2020; Bahri et al., 2022), and (3) performing cross-tabular pre-training for transfer learning (Wang & Sun, 2022b; Zhu et al., 2023).
Tabular data predictors have also been employed in medicine, for example in patient health risk prediction (Wang & Sun, 2022b) and clinical trial outcome prediction (Fu et al., 2022). Additionally, LLMs have been shown to be able to sample synthetic yet highly realistic tabular data (Borisov et al., 2022; Theodorou et al., 2023).
Evaluation of Embeddings of Laboratory Test Codes for Patients at a Cancer Center
Rossi, Lorenzo A., Shawber, Chad, Munu, Janet, Zachariah, Finly
Laboratory test results are an important and generally high-dimensional component of a patient's Electronic Health Record (EHR). We train embedding representations (via Word2Vec and GloVe) for LOINC codes of laboratory tests from the EHRs of about 80,000 patients at a cancer center. To include information about lab test outcomes, we also train embeddings on the concatenation of a LOINC code with a symbol indicating normality or abnormality of the result. We observe generally clinically meaningful similarities among LOINC embeddings trained over our data. For the embeddings of the concatenation of LOINCs with abnormality codes, we evaluate the predictive performance for mortality prediction tasks and the ability to preserve ordinality properties: i.e., a lab test with a normal outcome should be more similar to an abnormal one than to a very abnormal one.
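The ordinality property described above can be checked with a cosine-similarity comparison. The sketch below is a minimal illustration under stated assumptions: the code `XXXX-X` is a placeholder rather than a real LOINC code, the suffixes `N`, `A`, and `VA` are a hypothetical encoding of normal, abnormal, and very abnormal results, and the 2-D vectors are toy stand-ins for trained Word2Vec or GloVe embeddings.

```python
import math
from typing import Dict, List


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def token(loinc: str, flag: str) -> str:
    """Concatenate a lab code with an abnormality symbol (hypothetical format)."""
    return f"{loinc}_{flag}"


def preserves_ordinality(emb: Dict[str, List[float]], loinc: str) -> bool:
    """True if normal is closer to abnormal than to very abnormal."""
    normal = emb[token(loinc, "N")]
    abnormal = emb[token(loinc, "A")]
    very_abnormal = emb[token(loinc, "VA")]
    return cosine(normal, abnormal) > cosine(normal, very_abnormal)


# Toy vectors, not trained embeddings.
embeddings: Dict[str, List[float]] = {
    token("XXXX-X", "N"): [1.0, 0.0],
    token("XXXX-X", "A"): [0.8, 0.6],
    token("XXXX-X", "VA"): [0.0, 1.0],
}

print(preserves_ordinality(embeddings, "XXXX-X"))
```

With trained embeddings, the same comparison could be repeated for every lab code that has all three outcome tokens, and the fraction of codes passing the check reported as a summary.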
Explaining an increase in predicted risk for clinical alerts
Hardt, Michaela, Rajkomar, Alvin, Flores, Gerardo, Dai, Andrew, Howell, Michael, Corrado, Greg, Cui, Claire, Hardt, Moritz
Much work aims to explain a model's prediction on a static input. We consider explanations in a temporal setting where a stateful dynamical model produces a sequence of risk estimates given an input at each time step. When the estimated risk increases, the goal of the explanation is to attribute the increase to a few relevant inputs from the past. While our formal setup and techniques are general, we carry out an in-depth case study in a clinical setting. The goal here is to alert a clinician when a patient's risk of deterioration rises. The clinician then has to decide whether to intervene and adjust the treatment. Given a potentially long sequence of new events since she last saw the patient, a concise explanation helps her to quickly triage the alert. We develop methods to lift static attribution techniques to the dynamical setting, where we identify and address challenges specific to dynamics. We then experimentally assess the utility of different explanations of clinical alerts through expert evaluation.